A Model of Vocabulary Partition

Authors

  • Pierre Hubert
  • Dominique Labbé
Abstract

The model proposed here describes the vocabulary of a corpus. The vocabulary is divided into two groups: a general vocabulary, which is used whatever the circumstances, and several local (or 'specialized') vocabularies, each of which is used in only one part of the corpus. General words may appear anywhere in the text, and their increase with corpus length can be estimated with Muller's formula. A partition parameter measures the relative weight of the two types of vocabulary, so its value gives an estimate of the lexical 'specialization' of a text or corpus. The model can be used to measure the increase of vocabulary with corpus length, to locate thematic and stylistic changes within the corpus, and to compare several texts from the point of view of their lexical richness. An application to Racine's plays is presented.
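The abstract does not state its formulas, so the following is only a minimal sketch under stated assumptions. It assumes that Muller's formula takes its commonly cited binomial form, in which a word occurring i times in the whole corpus is absent from a fraction u of it with probability (1 - u)^i, and that the specialized vocabulary grows linearly with u. The function names and the parameter `p` (the assumed share of specialized vocabulary) are illustrative, not the authors' notation.

```python
from collections import Counter


def frequency_spectrum(tokens):
    """Map each frequency i to V_i, the number of distinct words occurring i times."""
    word_counts = Counter(tokens)
    return Counter(word_counts.values())


def expected_vocabulary(spectrum, u, p=0.0):
    """Expected number of distinct words in a fraction u of the corpus.

    ASSUMPTION (not given in the abstract): the general vocabulary follows
    the binomial form V - sum(V_i * (1 - u)**i), and the specialized
    vocabulary grows linearly as u * V. The hypothetical parameter p in
    [0, 1] weights the specialized part against the general part.
    """
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    total_types = sum(spectrum.values())
    # General words: a word of frequency i is missed with probability (1 - u)**i.
    general = total_types - sum(v_i * (1.0 - u) ** i for i, v_i in spectrum.items())
    # Specialized words: assumed to appear at a constant rate along the corpus.
    specialized = u * total_types
    return p * specialized + (1.0 - p) * general


tokens = "a b a c a b d".split()
spectrum = frequency_spectrum(tokens)
half_general = expected_vocabulary(spectrum, 0.5, p=0.0)
half_mixed = expected_vocabulary(spectrum, 0.5, p=0.5)
```

Under these assumptions, a larger `p` pulls the growth curve toward a straight line, which is one way to read the abstract's claim that the partition parameter estimates lexical specialization.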

Similar resources

Models of EFL Learners’ Vocabulary Development: Spreading Activation vs. Hierarchical Network Model

Semantic network approaches view organization or representation of internal lexicon in the form of either spreading or hierarchical system identified, respectively, as Spreading Activation Model (SAM) and Hierarchical Network Model (HNM). However, the validity of either model is amongst the intact issues in the literature which can be studied through basing the instruction compatible wi...

An employee transporting problem

An employee transporting problem is described and a set partitioning model is developed. An investigation of the model leads to a knapsack problem as a surrogate problem. Finding a partition corresponding to the knapsack problem provides a solution to the problem. An exact algorithm is proposed to obtain a partition (subset-vehicle combination) corresponding to the knapsack solution. It require...

EXPERIMENTAL ANALYSIS OF PARTITION COEFFICIENT IN Al-Mg ALLOYS

Because the partition coefficient is one of the most important parameters affecting microsegregation, the aim of this research is to experimentally analyse the partition coefficient in Al-Mg alloys. In order to experimentally measure the partition coefficient, a series of quenching experiments during solidification were carried out. For this purpose binary Al-Mg alloys containing 6.7 and 10.2 w...

Scientific Discovery from the Perspective of Hypothesis Acceptance

A model of inductive inquiry is defined within the context of first-order logic. The model conceives of inquiry as a game between Nature and a scientist. To begin the game, a nonlogical vocabulary is agreed upon by the two players, along with a partition of a class of countable structures for that vocabulary. Next, Nature secretly chooses one structure (“the real world”) from some cell of the p...

Verifying LVCSR Output at Different Levels with Generalized Posterior Probability

Generalized posterior probability (GPP), a statistical confidence measure, is used for verification of large vocabulary continuous speech recognition (LVCSR) output at subword, word and utterance levels. GPP is obtained by combining exponentially and optimally weighted products of acoustic and language model scores for reappeared units in the reduced search space (e.g., word graph). Experimenta...


Publication date: 2017